MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.
Internal identifier: 000901 (Main/Exploration); previous: 000900; next: 000902
Authors: Kévin Vervier [United States]; Pierre Mahé [France]; Jean-Philippe Vert [France]
Source:
- Methods in molecular biology (Clifton, N.J.) [1940-6029]; 2018.
Abstract
Metagenomics is the study of microbial community diversity, especially of uncultured microorganisms, through shotgun sequencing of environmental samples. As sequencer throughput and data volumes increase, it becomes challenging to develop scalable bioinformatics tools that reconstruct microbiome structure by binning sequencing reads to reference genomes. Standard alignment-based methods, such as BWA-MEM, provide state-of-the-art performance, but we demonstrated in Vervier et al. (2016) that compositional approaches using nucleotide motifs offer faster analysis times with comparable accuracy. In this work, we describe how to use MetaVW, a scalable machine learning implementation for binning short sequencing reads based on their k-mer profiles. We provide a step-by-step guide to how we trained the classification models and how the approach generalizes easily to user-defined reference genomes and specific applications. We also give additional details on how the algorithm's parameters affect performance.
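MetaVW bins each read according to its k-mer profile, that is, the counts of all length-k nucleotide substrings it contains. As a minimal illustration of what such a profile looks like, the sketch below counts overlapping k-mers in a single read. The helper name `kmer_profile`, the default `k=4`, and the choice to skip k-mers containing ambiguous bases such as `N` are assumptions for this example only; MetaVW's own feature extraction (choice of k, feature encoding, handling of ambiguous bases) may differ and is described in the chapter itself.

```python
from collections import Counter
from itertools import product

# Hypothetical helper for illustration; not MetaVW's actual code.
VALID_BASES = set("ACGT")


def kmer_profile(read: str, k: int = 4) -> dict:
    """Return counts of every overlapping k-mer in `read`.

    Assumption: k-mers containing characters other than A, C, G, T
    (for example the ambiguous base 'N') are skipped.
    """
    read = read.upper()
    counts = Counter(
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= VALID_BASES
    )
    # Fixed-length profile over all 4**k possible k-mers, in lexicographic
    # order, so that every read maps to a vector of the same dimension.
    return {"".join(p): counts.get("".join(p), 0) for p in product("ACGT", repeat=k)}


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGTN", k=2)
    # Show only the k-mers that actually occur in the read.
    print({kmer: c for kmer, c in profile.items() if c})
    # prints {'AC': 2, 'CG': 2, 'GT': 2, 'TA': 1}
```

A classifier in the spirit of the abstract would then take such fixed-length profiles (one per read) as input features and predict the reference genome each read comes from; the fixed 4**k dimension is what makes reads of different origins comparable.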
DOI: 10.1007/978-1-4939-8561-6_2
PubMed: 30030800
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.</title>
<author><name sortKey="Vervier, Kevin" sort="Vervier, Kevin" uniqKey="Vervier K" first="Kévin" last="Vervier">Kévin Vervier</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Psychiatry, University of Iowa Hospital and Clinics, Iowa, IA, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Psychiatry, University of Iowa Hospital and Clinics, Iowa, IA</wicri:regionArea>
<placeName><region type="state">Iowa</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Mahe, Pierre" sort="Mahe, Pierre" uniqKey="Mahe P" first="Pierre" last="Mahé">Pierre Mahé</name>
<affiliation wicri:level="1"><nlm:affiliation>Bioinformatics Research Department, BioMérieux, Marcy-l'Étoile, France.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Bioinformatics Research Department, BioMérieux, Marcy-l'Étoile</wicri:regionArea>
<wicri:noRegion>Marcy-l'Étoile</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Vert, Jean Philippe" sort="Vert, Jean Philippe" uniqKey="Vert J" first="Jean-Philippe" last="Vert">Jean-Philippe Vert</name>
<affiliation wicri:level="3"><nlm:affiliation>MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Fontainebleau, France. Jean-Philippe.Vert@mines-paristech.fr.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Fontainebleau</wicri:regionArea>
<placeName><region type="region">Île-de-France</region>
<region type="old region">Île-de-France</region>
<settlement type="city">Fontainebleau</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:30030800</idno>
<idno type="pmid">30030800</idno>
<idno type="doi">10.1007/978-1-4939-8561-6_2</idno>
<idno type="wicri:Area/PubMed/Corpus">000830</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000830</idno>
<idno type="wicri:Area/PubMed/Curation">000830</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000830</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000848</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000848</idno>
<idno type="wicri:Area/Ncbi/Merge">001F08</idno>
<idno type="wicri:Area/Ncbi/Curation">001F08</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001F08</idno>
<idno type="wicri:Area/Main/Merge">000904</idno>
<idno type="wicri:Area/Main/Curation">000901</idno>
<idno type="wicri:Area/Main/Exploration">000901</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.</title>
<author><name sortKey="Vervier, Kevin" sort="Vervier, Kevin" uniqKey="Vervier K" first="Kévin" last="Vervier">Kévin Vervier</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Psychiatry, University of Iowa Hospital and Clinics, Iowa, IA, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Psychiatry, University of Iowa Hospital and Clinics, Iowa, IA</wicri:regionArea>
<placeName><region type="state">Iowa</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Mahe, Pierre" sort="Mahe, Pierre" uniqKey="Mahe P" first="Pierre" last="Mahé">Pierre Mahé</name>
<affiliation wicri:level="1"><nlm:affiliation>Bioinformatics Research Department, BioMérieux, Marcy-l'Étoile, France.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Bioinformatics Research Department, BioMérieux, Marcy-l'Étoile</wicri:regionArea>
<wicri:noRegion>Marcy-l'Étoile</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Vert, Jean Philippe" sort="Vert, Jean Philippe" uniqKey="Vert J" first="Jean-Philippe" last="Vert">Jean-Philippe Vert</name>
<affiliation wicri:level="3"><nlm:affiliation>MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Fontainebleau, France. Jean-Philippe.Vert@mines-paristech.fr.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Fontainebleau</wicri:regionArea>
<placeName><region type="region">Île-de-France</region>
<region type="old region">Île-de-France</region>
<settlement type="city">Fontainebleau</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">Methods in molecular biology (Clifton, N.J.)</title>
<idno type="eISSN">1940-6029</idno>
<imprint><date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Base Sequence</term>
<term>Calibration</term>
<term>Genome, Bacterial</term>
<term>Machine Learning</term>
<term>Metagenomics (methods)</term>
<term>Reproducibility of Results</term>
<term>Sequence Analysis, DNA</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Analyse de séquence d'ADN</term>
<term>Apprentissage machine</term>
<term>Calibrage</term>
<term>Génome bactérien</term>
<term>Logiciel</term>
<term>Métagénomique (méthodes)</term>
<term>Reproductibilité des résultats</term>
<term>Séquence nucléotidique</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Metagenomics</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Base Sequence</term>
<term>Calibration</term>
<term>Genome, Bacterial</term>
<term>Machine Learning</term>
<term>Reproducibility of Results</term>
<term>Sequence Analysis, DNA</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Analyse de séquence d'ADN</term>
<term>Apprentissage machine</term>
<term>Calibrage</term>
<term>Génome bactérien</term>
<term>Logiciel</term>
<term>Métagénomique</term>
<term>Reproductibilité des résultats</term>
<term>Séquence nucléotidique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Metagenomics is the study of microbial community diversity, especially of uncultured microorganisms, through shotgun sequencing of environmental samples. As sequencer throughput and data volumes increase, it becomes challenging to develop scalable bioinformatics tools that reconstruct microbiome structure by binning sequencing reads to reference genomes. Standard alignment-based methods, such as BWA-MEM, provide state-of-the-art performance, but we demonstrated in Vervier et al. (2016) that compositional approaches using nucleotide motifs offer faster analysis times with comparable accuracy. In this work, we describe how to use MetaVW, a scalable machine learning implementation for binning short sequencing reads based on their k-mer profiles. We provide a step-by-step guide to how we trained the classification models and how the approach generalizes easily to user-defined reference genomes and specific applications. We also give additional details on how the algorithm's parameters affect performance.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
<li>États-Unis</li>
</country>
<region><li>Iowa</li>
<li>Île-de-France</li>
</region>
<settlement><li>Fontainebleau</li>
</settlement>
</list>
<tree><country name="États-Unis"><region name="Iowa"><name sortKey="Vervier, Kevin" sort="Vervier, Kevin" uniqKey="Vervier K" first="Kévin" last="Vervier">Kévin Vervier</name>
</region>
</country>
<country name="France"><noRegion><name sortKey="Mahe, Pierre" sort="Mahe, Pierre" uniqKey="Mahe P" first="Pierre" last="Mahé">Pierre Mahé</name>
</noRegion>
<name sortKey="Vert, Jean Philippe" sort="Vert, Jean Philippe" uniqKey="Vert J" first="Jean-Philippe" last="Vert">Jean-Philippe Vert</name>
</country>
</tree>
</affiliations>
</record>
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000901 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000901 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= pubmed:30030800 |texte= MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:30030800" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.